This characteristic of reinforcement learning inevitably makes learning harder for an intelligent system and lengthens the learning time. 强化学习的这种特性必然增加智能系统的困难性,学习时间增长。
For different situations, zero-sum and general-sum games are adopted respectively as the theoretical framework for reinforcement learning, and probabilistic incremental program evolution is used to improve system stability. 对于不同情况,分别将零和策略及一般和策略作为强化学习的理论框架,并且借助概率增量编程进化提高系统的稳定性。
Designing the reward function is one of the difficulties in building a reinforcement learning system. 强化函数的设计是构建多智能体学习系统的一个难点。
A Hierarchical Reinforcement Learning Method with Variable Learning Rate for Soccer Robot Multi-agent Adversarial System 一种足球机器人多智能体对抗系统的分层变学习率增强式学习方法
Because of its distinctive features, reinforcement learning is increasingly applied in distributed multi-agent robotics (RoboCup). 增强学习由于其独特的特点,在分布式机器人(RoboCup)中的应用越来越多。
Applications and problems of reinforcement learning algorithms in multi-agent systems. 多Agent系统中强化学习的应用和问题。
The implementation of the input module, reinforcement module, and policy module in a reinforcement learning system is discussed in detail, and the working principle of such a system is analyzed with reference to a simple reinforcement learning system. 详细讨论了强化学习系统中的输入模块、输出模块及策略模块的实现方法,并结合一般的强化学习系统,分析了强化学习系统的工作原理。
Application of reinforcement learning based on agent systems 基于Agent系统再励学习的应用
In the behavior learning process of a robot, it is difficult to obtain a good teaching signal for supervised learning, so reinforcement learning is applied to the behavior learning problem of a multi-robot system. 在机器人行为学习过程中,难于得到比较理想的监督学习的教师信号,因此该文尝试采用强化学习方法来解决多机器人的行为学习问题。
This method further improves the applicability of reinforcement learning to real control systems. 该方法进一步提高了强化学习理论在实际控制系统中的应用价值。
An improved reinforcement learning algorithm is proposed to balance a double inverted pendulum when the model of the pendulum is unavailable and the agent has no a priori control knowledge. 在模型未知和没有先验经验的条件下,采用一种改进的强化学习算法实现二级倒立摆系统的平衡控制。
Building on existing reinforcement learning and dynamic programming algorithms, this paper presents an improved reinforcement learning system that uses two BP networks. 本课题在强化学习和动态规划算法的基础上,提出了一种基于双BP网络的强化学习系统。
New on-policy, model-free average-reward reinforcement learning algorithms are derived as stochastic approximation methods for solving the system of equations of average-reward Markov decision processes. 本文以随机逼近的形式,提出了一些用于求解平均奖赏Markov决策过程系统方程的在策略无模型激励学习算法。
Reinforcement Learning in Multi-Agent Systems: Current Research and Development Trends 多Agent系统中强化学习的研究现状和发展趋势
A set of optimised fuzzy control rules can be generated automatically through reinforcement learning based on the state variables of the controlled system. 该控制器能根据被控对象的状态通过增强型学习自动生成模糊控制规则。
MAXQ, a hierarchical reinforcement learning method for multi-agent systems, has been proposed in recent years. MAXQ分层多智能体学习方法是近年来被提出的一种新方法。
It is necessary to partition the state space into several regions and to establish a reinforcement learning system with a partitioning function. 分割状态空间为几个区域,建立具有分割功能的加强学习系统是必要的。
A reinforcement learning control method based on the cerebellar model articulation controller (CMAC) is presented for position control of an electrohydraulic servo system with uncertainties. 针对非线性电液位置伺服系统的不确定性控制问题,提出了一种带有小脑模型(CMAC)神经网络的再励学习控制方法。
The Architecture and Algorithms of Reinforcement Learning Systems 强化学习系统的结构及算法
To address the exponential growth of the learning space of a Reinforcement Learning in Groups (RLG) system with the number of agents, a prediction-based RLG algorithm is presented. 针对群体强化学习系统的学习空间随着智能体个数的增加而指数级膨胀的问题,提出了一种基于预测的群体强化学习算法。
However, because traditional reinforcement learning theory assumes a Markovian environment, an assumption that no longer holds in a multi-agent system, its algorithms cannot be applied directly to cooperative learning in multi-agent systems. 但是由于强化学习理论的限制,在多智能体系统中马尔科夫过程模型不再适用,因此强化学习不能直接用于多智能体的协作学习问题。
Reinforcement learning system and its learning algorithms for reliability optimization 强化学习系统及其基于可靠度最优的学习算法
This paper addresses the low learning efficiency that improper state-value generalization and random exploration policies cause in deterministic MDPs, and proposes a model-based hierarchical reinforcement learning algorithm. 针对强化学习算法的状态值泛化和随机探索策略在确定性MDP系统控制中存在着学习效率低的问题,本文提出基于模型的层次化强化学习算法。
Because a reinforcement learning system learns through interaction with its environment and requires neither a teacher signal nor prior knowledge, it is increasingly used in the field of artificial intelligence. 由于强化学习能在与环境的交互中进行学习,且具有无需教师信号和先验知识的优点,其在人工智能领域的应用已越来越多。
Reinforcement learning mainly acts on the memory system, while error-driven learning acts on the inference system. 强化学习主要对记忆系统发挥作用,错误驱动则对推理系统发挥作用。
How to introduce intelligent technologies such as fuzzy logic, neural networks, reinforcement learning, and multi-agent systems into satellite network applications, so as to improve the network's self-organization, cooperation, and resistance to attack, is an important direction in the development of sensor network technology. 如何将智能技术,如模糊逻辑、神经网络、增强学习和多智能体技术引入卫星网络应用中,提高卫星网络的自组织性、协同性和抗攻击能力是传感器网络技术发展的重要方向之一。
With multiple managers, the middle managers can exchange response messages with each other to obtain statistics about the other sub-networks. Combined with reinforcement learning, the algorithm selects the optimal policy over a finite horizon, thereby maximizing the system reward. 在多管理者管理的情况下,中间管理者之间可以交换响应消息,从而获得其他子网的统计数据,本算法结合强化学习模式,在有限阶段选择最优策略,从而使系统报酬最大化。
To make the robot system more intelligent and better able to adapt to a dynamic environment, an immune reinforcement learning algorithm that exploits the self-learning, self-adaptation, and memory abilities of the immune system is proposed and applied to the robot system. 为了提高多机器人系统的智能性,更好的适应环境,利用免疫系统的自学习、自适应和免疫记忆特性,提出免疫强化学习方法,应用于机器人系统。
RLBS builds a search-keyword information table and a neighbor information table on each node and uses historical search results to build a reinforcement learning model, continuously improving the efficiency of subsequent resource discovery in the system. RLBS算法在节点上构建搜索关键字信息表及邻居节点信息表,以过往资源查找经验为指导,通过查找结果来构建强化学习模型,从而不断优化系统的资源查找效率。
It takes full advantage of the learning ability of relational reinforcement learning and improves the adaptability of the system. 充分利用关系强化学习的学习能力,提高了系统适应性。